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Abstract 

This paper is based on a lecture given in the LACONEU summer 
school, Valparaiso, January 2012. We introduce Gibbs distribution in a 
general setting, including non stationary dynamics, and present then three 
examples of such Gibbs distributions, in the context of neural networks 
spike train statistics: (i) Maximum entropy model with spatio-temporal 
constraints; (ii) Generalized Linear Models; (iii) Conductance based Inte- 
grate and Fire model with chemical synapses and gap junctions. 

Keywords Neural networks dynamics; spike train statistics; Gibbs distribu- 
tions. 

1 Introduction 

Neurons communicate among them by generating action potentials or "spikes" 
which arc pulses of electrical activity. When submitted to external stimuli, 
sensory neurons produce sequences of spikes or "spike trains" constituting a 
collective response and a dynamical way to encode information about those 
stimuli. However, neural responses are typically not exactly reproducible, even 
for repeated presentation of a fixed stimulus. Therefore, characterizing the re- 
lationship between sensory stimuli and neural spike responses can be framed as 
a problem of determining the most adequate probability distribution relating 
a stimulus to its neural response. There exist several attempts to infer this 
probability from data and / or general principles, based on Poisson or more 
general point processes [TJ [TBI 155 , Bayesian approaches [25[ 123] , maximum en- 
tropy [471 155] (for a review see [43) ) . In this paper we present several situations 
where the notion of Gibbs distributions is appropriate to address this problem. 

The concept of Gibbs distribution comes from statistical physics. We use it 
here in a more general sense than the one usually taught in standard physics 
courses, although it is part of mathematical statistical physics [22]. We argue 
here that Gibbs distributions might be canonical models for spike train statistics 
analysis. This statement is based on three prominent examples. 
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1. The so-called Maximum Entropy Principle allows one to propose spike 
train statistics models considering restrictions based on empirical obser- 
vations. Although this approach has been initially devoted to show the 
role of weak instantaneous pairwise correlations in the retina |47) . it has 
been recently applied to investigate the role of more complex events such 
as instantaneous triplets [H] or spatio-temporal events [55] • Probabil- 
ity distributions arising from the Maximum Entropy Principle are Gibbs 
distributions. 

2. Other approaches such as the Linear-Non Linear (LN) or Generalized Lin- 
ear Models (GLM) propose an ad hoc form for the conditional probability 
that a neuron fires given the past network activity and given the stimu- 
lus. Those models have been proven quite efficient for retina spike trains 
analysis |41j . They are not limited by the constraint of stationarity, but 
they are based on a questionable assumption of conditional independence 
between neurons. As we show, the probability distributions coming out 
from those models are also Gibbs distributions. 

3. Recent investigations on neural networks models (conductance based integrate- 
and-fire (IF) with chemical and electric synapses) show that statistics of 
spike trains generated by these models arc Gibbs distributions reducing to 

1 when dynamics is stationary, and reducing to 2 in specific cases [71l51ll4j. 
In the general case, the spike trains produced by these models have Gibbs 
distributions which neither match 1 nor 2. 

The paper is organized as follows. After some definitions regarding spike 
train statistics and a presentation of Gibbs distributions we develop these three 
examples, with a short discussion of their advantages and drawbacks in spike 
trains analysis. Then, we discuss some relations between these models, mainly 
based on the Hammersley-Clifford theorem [Ml [3J [Ml I3H]- This paper is a 
summary of several papers written by the authors and other collaborators 
[HI [Ml HI Hi]- As such it does not contain original material (except the 
presentation). 

2 Definitions 

2.1 Spike trains 

We consider a network of N neurons. We assume that there is a minimal 
time scale 5 > corresponding to the minimal resolution of the spike time, 
constrained by biophysics and by measurements methods (typically S ~ 1 ms) 
[Til ITU] . Without loss of generality (change of time units) we set 5 = 1, so 
that spikes are recorded at integer times. One then associates to each neuron 
k and each integer time n a variable w&(n) = 1 if neuron k fires at time n 

and cjfc(n) = otherwise. A spiking pattern is a vector u)(n) d = [^k(n)]^! =1 
which tells us which neurons are firing at time n. We note A = { 0, 1 } N the 
set of spiking patterns. A spike block is a finite ordered list of spiking patterns, 
written: 

<\ = { W (™)}{ ni < n <„ 2 }' 
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where spike times have been prescribed between the times n\ to n% (i.e. , 
Ti2 — Tii + 1 time steps). The range of a block is ni — ri\ + 1, the number of 
time steps from n\ to n-2- The set of such blocks is A n2 ~ ni+1 . Thus, there 
are 2 Nn possible blocks with N neurons and range n. We call a raster plot a 
bi-infmite sequence ui = f {^(n)}^^^, of spiking patterns. Obviously exper- 
imental rasters are finite, but the consideration of infinite sequences is more 
convenient mathematically. The set of raster plots is denoted fl = A^. 

2.2 Transition probabilities 

The probability that a neuron emits a spike at some time n depends on the 
history of the neural network. However, it is impossible to know explicitly its 
form in the general case since it depends on the past evolution of all variables 
determining the neural network state. A possible simplification is to consider 
that this probability depends only on the spikes emitted in the past by the 
network. In this way, we are seeking a family of transition probabilities of the 
form P„[o;(n) jw^I^], the probability that the firing pattern ui{n) occurs at 
time n, given a past spiking sequence oj™Ze>- Here, D is the memory depth of the 
probability, i.e., how far in the past does the transition probability depend on the 
past spike sequence. We use the convention that P n [w(n) | oj„Z]j} = P n [w(n)] 
if D = (memory-less case). 

The index n of P„[. | .] indicates that transition probabilities depend explic- 
itly on the time n. We say that those transition probabilities are time-translation 
invariant or stationary if for all n, P n [w(?i) | = Pd[w(I>) | uj^ 1 ] when- 

ever = u>Q~ l (i.e the probability does not depend explicitely on time). In 

this case we drop the index n. 

Transition probabilities depend on the neural network characteristics such 
as neurons conductances, synaptic responses or external currents. They give 
information about the dynamics that takes place in the observed neural net- 
work. Especially, they have a causal structure where the probability of an event 
depends on the past. This reflects underlying biophysical mechanisms in the 
neural network, which are also causal. 

2.3 Gibbs distribution 

We define here Gibbs distributions (or Gibbs measures) in a more general setting 
that the one usually taught in statistical physics courses, where Gibbs distribu- 
tions are considered in the realm of stationary process and maximum entropy 
principle. Here, we do not assume stationarity and the definition encompasses 
the maximum entropy distributions. The Gibbs distributions considered here 
are called chains with complete connections in the realm of stochastic processes 
[THIGH] and g-measures in ergodic theory [57] • They are also studied in mathe- 
matical statistical physics [22] . 

2.3.1 Continuity with respect to a raster 

For n G %, we note -A™"^ the set of sequences w™^ 1 . Assume that we are given 
a set of transitions probabilities, like in the previous section, possibly depending 
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on an infinite pasfl i.e. of the form P„[w(n) | w™^ 1 ]. We give in section [5751 an 
example of neural network model where such transition probabilities with an 
infinite memory do occur. 

Even if transition probabilities involve an infinite memory w"^ 1 , it is rea- 
sonable to consider situations where the effects of past spikes decreases expo- 
nentially with their distance in the past. This corresponds to the mathematical 
notion of continuity with respect to a raster. We note, for n G 7L, m > 0, and r 
integer: 

uj = lj if w(r) = ui (r), Vr £ {n — m, . . . , n } . 

Consider a function / depending both on discrete time n and on the raster part 
of u> anterior to n. We write f(n,u)) instead of /(rijO;™^ 1 ). The function / is 
continuous with respect to the raster lu if its m- variation: 

var m [f(n, .)] := sup j | f(n, u) - f(n, : u J } (1) 

tends to as m — > +oo. This precisely means that the effect, on the value of / 
at time n, as this change is more distant in the past. 

2.3.2 Gibbs distribution 

Definition 2.1 A Gibbs distribution is a probability measure /i : Q — > [0,1] 
such that: 

(i) for all n S 7L and all J r <„-measurable functions /: 




(ii) Vn g 2, V^ 1 6 A n _^, F n [u(n) \ oj^} > 0. 
(hi) For each n £ Z, P„[w(n) | w"^ 1 ] is continuous with respect to u). 

The condition (i) is a natural extension of the condition defining the invariant 
probability of an homogeneous Markov chain (see eq. (|2|) next section). In its 
most general sense (i) does not require stationarity and affords the consideration 
of an infinite memory. It defines so-called compatibility conditions. They state 
that the average of a function f(n,ui) with respect to /i, at time n (left hand 
side) , is equal to the average computed from transition probabilities (right hand 
side). This equality must hold for any time n. 

There exist several theorems guaranteeing the existence and uniqueness of a 
Gibbs distribution [TS] : this holds if the variation of transition probability 
decays sufficiently fast with time (typically exponentially) as n — m — > — oo. 

-^n this case, one has to assume that (i) for every to(n) S A , P n [u>(n) .] is measur- 
able with respect to J r <„_i, the sigma-algebra on A 7 ^^; (ii) for every £ oo ' 
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2.4 Markov chains 



Straightforward examples of Gibbs distributions defined that way are provided 
by Markov chains with positive transition probabilities. Recall that a Markov 
chain of length D is defined by a set of transition probabilities P„[w(n) | co^Zjj ] 
where the memory depth D > is finite. These transition probabilities are 
obviously continuous with respect to uj. If we assume moreover that they are 
strictly positive Vn 6 2, Vw™^. 1 6 A"L^ then they match (ii) in the definition 
above. Finally, in this case, (i) is equivalent to the following property. For any 
time n\ , n?, n% — ri\ > D: 

n[<]= n wokilm^ -1 ]- ( 2 ) 

l=m+D 

For any times n\,ri2 as above, the Gibbs-probability /i ] is given bjH the 
product of the Gibbs probability of the "initial block" fj, [w^j +£>_1 ] and the 
products of transition probabilities from the initial time n\ +D to the last time 
n 2 . 

Here we have considered transition probabilities depending explicitly on 
time n. When they are time-translation invariant (homogeneous Markov chain) 
the definition (2.1) is the definition of the unique invariant distribution of the 
Markov chain (it is unique because we have assumed positive transition proba- 
bilities). 

Let us now state © in a different form. Define: 

</>n(n,u) :=logP n [w(n) (3) 
called a normalized Gibbs potential. Then, ([2]) can be stated using: 

"2 

/i[<|<+ 1J - 1 ] =cxp J2 (4) 

i=m+D 

This form reminds the Gibbs distribution on spin lattices in statistical physics 
where one looks for lattice translation-invariant probability distributions given 
specific boundary conditions. Given a potential of range D the probability of a 
spin block depends on the states of spins in a neighborhood of size D of that 
block. Thus, the conditional probability of this block given a fixed neighbor- 
hood is the exponential of the energy characterizing physical interactions within 
the block as well as with the boundaries. Here, spins are replaced by spiking 
patterns; space is replaced with time which is mono-dimensional and oriented: 
there is no dependence in the future. Boundary conditions are replaced by the 
dependence in the past. 

The definition © of the normalized Gibbs potential extends to the case 
D -> +oo. 

2 One also says that fi is compatible with the set of transition probabilities. 
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3 Gibbs distributions and models of spike train 
statistics 



In this section we review several examples of models/concepts used to analyze 
spike train statistics. All of them enter in the realm of Gibbs distributions 
defined above. 

3.1 Maximum entropy models 

The definition (2.1) affords time-dependent transition probabilities. On the 
opposite, in this section we assume that they do not depend explicitly on n, or, 
equivalently, that they are time-translation invariant. This corresponds to the 
physical concept of stationarity. 

Assume that spike trains statistics is distributed according to an hidden 
probability \i. How to approach \x from data ? Maximum entropy provides a 
method that allows to approach fi. It selects among all the probability distribu- 
tions consistent with empirical data constraints, the most random i.e. the one 
with the highest entropy. But, why should we choose the maximum entropy dis- 
tribution? The answer is that since entropy is a measure of information, then 
one should choose the probability that includes the least amount of informa- 
tion we have about the system and no more. The result probability is a Gibbs 
distribution. 

3.1.1 Entropy 

We define the entropy rate (or Kolmogorov-Sinai entropy) of a probability /z £ 
A4i nv the set of time-translation invariant probability measures as: 

h[n] = - limsup — — - V /z[a#] log/i[u#], (5) 

n^oo n + 1 

where the sum holds over all possible blocks uiq. Note, that in the case of a 
Markov chain h[fx] also reads [TS] : 

h[ft] = -$>K] p W) K^llogPKD) Iwf" 1 ], (6) 
Finally, when D = 0, h[fi] reduces to the usual definition: 

MM) = -X>Mo)]iogMMo)]- (7) 

We used here the notation h((j,) instead of S or s, used in statistical physics. 
This is the conventional notation in crgodic theory for the (Kolmogorov-Sinai) 
entropy where the dependence on the measure fi is made explicit. 

3.1.2 Observables 

We call observable a function: 
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0:il -> {0,1}, 

r 

uj ^ \uk u { n u) (8) 

u=l 

i.e. a product of binary spike events where k u is a neuron index and n u a 
time index, with u = 1, ...,r, for some integer r > 0. Typical choices of 
observables are ujk x {n\) which is 1 if neuron k\ fires at time n\ and is otherwise; 
Wfei Wfc 2 {n-i) which is 1 if neuron k\ fires at time n\ and neuron ki fires at 
time n.2 and is otherwise. Another example is uj^[ji\) (1 — uj^iji-j)) which is 
1 is neuron k\ fires at time n\ and neuron ki is silent at time U2- This example 
emphasizes that observables arc able to consider events where some neurons are 
silent. 

We say that an observable O has range R if it depends on R consecutive 
spike patterns, e.g. 0{uj) = Oioj^ 1 ). We consider here that observables do not 
depend explicitly on time {time-translation invariance of observables) . As a con- 
sequence, for any time n, = C(o;™ +fl_1 ) whenever uJq^ 1 = 

3.1.3 Potential 

A function of the form: 



Up : n -> R, 

N 

uj ^ Y.^ k<Dk - ( 9 ) 

fe=i 

is called a potential, where the coefficients (3k are finitc| real numbers. The 
range of the potential is the maximum of the range of the observables Ok- 

3.1.4 Variational principle 

Fix a potential Up as in @. Assume that it has finite range D. 

In this case, a Gibbs distribution (i obeys the following variational principle: 

V\Up\= sup {h\v\ + v[Up]) =h[fx] + n[Up], (10) 

where V [ Up ] is called the topological pressure, v [ Up ] = Ysk=i Pkv[Ok] is the 
average value of Up with respect to the probability v and M.i nv is the set of 
time-translation invariant probability measures on fi. We use the notation v(f) 
for the average of a function / instead of < / > used in statistical physics or 
E t/ (/) used in probability theory. Note that Observables and Gibbs potentials 
are random functions that acts on the set of raster plots f2. 

Looking at the second equality, the variational principle (fTU|) selects, among 
all possible probabilities v, a unique one, the Gibbs distribution, realizing the 

3 Thus, we do not consider here hard core potentials with forbidden configurations. 
4 The variational principle still holds if the range is infinite and its variation JTJ decays 
sufficiently fast with m, typically exponentially 46 3 12 . 
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supremum. A variant of this principle holds when the average value of observ- 
ables Ok is constrained to a value Ck, fixed e.g. by experimental observations. 
In this case v [Hp ] becomes Ylk=i PkCk if the average value of all observables 
Ok is constrained. In this case the variational principle (|10[) reduces to maximiz- 
ing the entropy on the set of measures v £ Mi nv such that v [ Ok ] = Ck- Then, 
one is lead to a classical Lagrange multipliers problem where the /3fcS are the 
Lagrange multipliers. This is the classical approach introduced by Jaynes [25] . 
In this setting (|10p signifies: "maximizing the entropy given the information 
that we have of the system" i.e. the observed average value of the observables 
Ok is C k . 



3.1.5 Topological pressure 

The topological pressure is the formal analogue of free energy density. It has 
the following properties: 

• V [Hp } is a log generating function of cumulants. We have: 



Opk 

and 

d 2 V [Hp ] d»[O k ] ^ 

n— 

where C 0k o, (n) 



C 0k o l (n) = n[O k O l oa n ] - fi[Ok}^[0,}, 

is the correlation function between the two observables Ok and Oi at 
time n and a is the time shift operator. Note that correlation func- 
tions decay exponentially fast whenever Hp has finite range. So that 

En=0 C O k O l ( n ) < +°°- 

Eq. (fTZj) characterizes the variation in the average value of Ok when 
varying /3; (linear response). The corresponding matrix is a susceptibility 
matrix. It controls the Gaussian fluctuations of observables around their 
mean (central limit theorem) ESI H2] • 

• V(Hp) is a convex function of f3. 

• Define: 

The topological pressure obeys: 

T(Hp)= lim -logZ„, 

n— >+oo n 

and is analogous to a thermodynamic potential density (free energy, free 
enthalpy, pressure). 
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Remark 1 For D > one cannot write the Gibbs distribution in the form: 



v[u%] = ±-e n 'W\ (14) 

An 

It only obeys: 3A, B > such that, for any block Wq 

A < mKJ < p 

This is actually the definition of Gibbs distributions in ergodic theory [T^] . 
3.1.6 Markov chain 

The choice of the potential (|9|), i.e. the choice of a set of observables, fixes the 
restrictions for the statistical model. A normalization procedure allows to find a 
normalized potential <p equivalent to Tip from which the transition probabilities 
arc constructed. This defines an homogeneous Markov chain whose invariant 
measure is the Gibbs distribution associated with Hp. It is constructed as 
follows. 



Transition matrix 

Consider two spike blocks w\ , u>2 of range D > 1. The transition w\ — > u>2 is 
legal if w\ has the form ^(O)^^ -1 and W2 has the form ui^ 1 ui{D). The vectors 
w(0), to(D) are arbitrary but the block wf -1 is common. Here is an example of 
a legal transition : 

wi = [ s ; i ];«* = [ ; ; i }■ 

Here is an example of a forbidden transition 

toi = [ s ? ; ];«* = [ s ; ; ]. 

Any block Wq* of range i? = D + 1 can be viewed as a legal transition from 
the block w\ = uJq^ 1 to the block u>2 = cjf and in this case we write u)q ~ W1W2. 



The transfer matrix C is defined as: 

- _ J e Hf3 ^°^ if wi,W2 is legal with Wq 3 ~ W1W2 ^rt 
0, otherwise. ' [ ' 



Perron-Frobenius theorem 

From the matrix C the transition matrix of a Markov chain can be constructed. 
Since 'Hp(iO^ > ) > — oo, e^* 3 ^ ' > for each legal transition. As a consequence 
of the Perron-Frobenius theorem [2Tll4"B] . C has a unique real positive eigenvalue 
sp, strictly larger than the modulus of the other eigenvalues (with a positive 
gap), and with associated right, R, and left, L, eigenvectors: CR = spR, LC = 
spL. 

The following holds: 

5 Two potentials are said "equivalent" or cohomologous if and only if they correspond to 
the same Gibbs distribution I26| . 
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These eigenvectors have strictly positive entries R ( . ) > 0, L ( . ) > 0, 
functions of blocks of range D. They can be chosen so that the scalar 
product (L,R) = 1. 

We have: 

V(Hp) =log S(3 . (16) 

The following potential: 

4>{cj^)=Up(u^)-g p (i^) (17) 

with: 

Qp K D ) = log R ( c^ 1 ) - log R ( wf ) + log S/J , (18) 

is equivalent to Hp and normalized. It defines a family of transition prob- 
abilities: 

V[u(D) \^- 1 ] d ^ e^S) >0 . (19) 

These transition probabilities define a Markov chain which admits a unique 
invariant probability: 

l x{u°- l )=R{u%- 1 )L(u°- 1 ). (20) 

which is the Gibbs distribution satisfying the variational principle (|10p . 

It follows that the probability of blocks of depth n > D is: 

A* [w? ] = K-d+i ) £ K" 1 ) ■ (21) 

In the case Z? = the Gibbs distribution reduces to (TH| . One can indeed 
easily show that: 

Additionally, since spike patterns occurring at distinct time are indepen- 
dent in the D = case, Z n in (fTU)) can be written as Z n = Za so that 
V(Hp) =\og Zp. 

In the general case of spatio-temporal constraints, the normalization re- 
quires the consideration of normalizing function Qp depending as well on 
the blocks uiq . Thus, in addition to function "Hp normalization intro- 
duces a second function of spike blocks. This increases consequently the 
complexity of Gibbs potentials and Gibbs distributions compared to the 
spatial (D = 0) case where Qp reduces to a constant. 
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3.1.7 Examples 

We give here a few examples of Maximum Entropy Gibbs distributions, found 
in the literature. 

• Bernoulli model. Here only firing rates of neurons are constrained. The 
potential has the form: 

JY 

W /3 (w(0)) = 5^Awi(0). 

i=l 

This is a memory-less model, where transitions probabilities are given by 
neuron firing rates Xi = y+^W- The Gibbs distribution has the form: 

n N 

mkj= n n ^'"(i-aj) 1 -"'!') (22) 

This is thus a product probability where neurons are independent. 

• Ising model. This model was introduced by Schneidman et al |47j for 
retina spike train analysis. Here, firing rates and instantaneous pairwise 
synchronisation probabilities are constrained. The potential has the form: 

N N 
i=l »,i=l 

This is a memory-less model where the Gibbs distribution has the classical 
form (fI3|), 

• Extended spatial Ising model. A natural extension of Ising model has 
been proposed by Ganmor et al [19j . where triplets and more general 
synchronous spike events are considered. The potential has the form: 

N N N 

H (3 (u(0))=^ftw i (0)+^ j 8yw i (0)w i (0)+ 51 AifcWi(O) a> J (0)w fc (0)+. 

t=l i}j=l 

This is a memory-less model where the Gibbs distribution has the classical 
form (JTI|). 

• Spatio temporal Ising model. In [31j Marre et al considered a spatio- 
temporal extension of the Ising model where the potential has the form: 

N N 

Here spatio-temporal pairs with memory depth 1 are considered. Although 
the Gibbs distribution has not the form (|14[). the authors use an approxi- 
mation of the exact distribution by this form, based on a detailed balance 
assumption. They applied this model for spike train analysis in the cat 
parietal cortex. 

• General Spatio temporal model. General models of the form ([9]) have been 
considered in [551 IS! EH] for the analysis of retina spike trains. A C++ 
implementation of methods for fitting spatio-temporal models from data 
is available at http : //enas . gf orge . inria. f r/v3/| 
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3.1.8 Applications 

The maximum entropy principle has been used by several authors \$T\ |2H1 [5J3 [55J 
[35J HH EH HD] for Multi-electrode Arrays (MEA) spike train analysis. Efficient 
methods have been designed to estimate the parameters of the potential, in the 
spatial case [T7] (Broderick et al., 2007) and in the spatio-temporal case [M] . 

This approach, grounded on statistical physics, attempts to find a generic 
model for spike statistics based on a potential of the form where the observ- 
ables and their related (3 parameters summarize "effective interactions" between 
spikes. Behind this approach exists, we believe, a physicists "dream" : inferring, 
from data analysis, the equivalent of the equation of states existing in thermo- 
dynamics; that is, summarizing the behaviour of a big neuronal system by a few 
canonical variables (analogous e.g. to temperature, pressure, volume in a gas). 
To our opinion, recent remarkable investigations to exhibit critical phenomena 
in retina spike train statistics are part of this project (Tkacic et al., 2006,2009) 

The main advantage of this approach is the possibility of constructing dif- 
ferent statistical models based on a priori hypotheses on the most statistically 
significant events (single spikes, pairs, triplets, and so on). As such, it allows 
to consider arbitrary forms of spatio-temporal correlations. But this strength is 
also a weakness. Indeed, the possible forms of potentials are virtually infinite 
and obviously, in the setting of neuronal dynamics, one does not have the equiv- 
alent of mechanics or thermodynamics to construct the potential from general 
principles. 

Finally, this approach only holds for stationary data, a highly questionable 
assumption as far as data from living systems are concerned. 

3.2 Generalized Linear model 

We now consider a second class of Gibbs distributions related to statistical mod- 
els called Linear-Nonlinear (LN) model and Generalized Linear Model (GLM) 
HIMllSniinZllMlSlSnillJSI]- We focus here on the GLM and follow the 
presentation of Ahmadian et al [T] . 

3.2.1 Conditional intensities 

GLMs are commonly used statistical methods for modeling the relationship 
between neural population activity and presented stimuli. Let x = x(t) be 
a time-dependent stimulus. In response to x the network emits a spike train 
response r. This response does not only depend on x, but also on the network 
history of spiking activity. The GLM (and LN) assimilate the spike response 
r as an inhomogencous point process: the probability that neuron k emits a 
spike between t and t + dt is given by Afc(i | H t ) dt, where Xk{t | H t ) is called 
"conditional intensity" and H t is the history of spiking activity up to time t. In 
the GLM this function is given by: 




(23) 



where: 
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• / is a non linear function (an exponential or a sigmoid); 



• bk is some constant fixing the baseline firing rate of neuron k; 

• K k is a causal, time-translation invariant, linear convolution kernel that 
mimics a linear receptive field of neuron k; 

• * is the convolution product; 

• Hkj is the memory kernel that describes possible excitatory or inhibitory 
post spike effects of the j th observed neuron on the k th . As such, it 
depends on the past spikes, hence on u. The diagonal components H k k 
describe the post spike feedback of the neuron to itself, and can account 
for refractoriness, adaptation and burstiness depending on their shape; 



time of the r th spike of j th neuron. 



rj is the spike train of neuron j: rj(t) = X) r >i — )> wnere tj is the 



The spike response has a history dependent structure that makes Poisson 
models inappropriate. Point processes affords for history dependence and gen- 
eralizes Poisson process. A point process can be completely characterized by its 
conditional intensity function. 

w , , _ , P(AjV [t+At )=i I H t ) 

where iV[ t+ At) is the counting process that gives the number of spikes occurring 
in the interval [t + At). Choosing At to be a sufficiently small time interval 
~ 1ms, the probability of firing more than one spike is negligibly small compared 
to the probability of firing one spike. This assumption is biophysically plausible 
because neurons have refractory period. Therefore: 

P(spikc in [t + At) \ E t ) « X k (t \ H t )At. 

Here Xk(t H t ) is defined in continuous time, and spikes are discrete events. 
If we discretize the time to make the spikes emitted by the point process belong 
to a single bin, we have: 

P(w fc (n) = 1 | H„_i) P3 X k (n \ H„_ x )Af := p k (n) 

3.2.2 Conditional independence 

The GLM postulates that, given the history H and stimulus x, neurons are 
independent (conditional independence upon the past and stimulus). In the 
context of transition probabilities defined on section 12.21 the response at time 
n is a spiking pattern uj(n) while the history is the spike activity H. As a 
consequence of the conditional independence assumption the probability of a 
spike pattern follows a Bernoulli process: 

N 

P n [w(n) l^" 1 ] = l[pk(nr^ n) (l- P k(n)) 1 - u ^ n) . (24) 
fe=i 
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3.2.3 Gibbs distribution 

Transition probabilities are strictly positive whenever < Pk{n) < 1, for all k,n. 
If / is e.g. a sigmoid this holds provided its argument bi + (Ki*x)(t) + J^j {Hij * 
Tj)(t) remains bounded in absolute value. The continuity of A with respect to 
u holds whenever / is continuous and the memory kernel H is continuous with 
respect to u>. This second condition is fulfilled in two cases: 

• H depends on a finite past; 

• H depends on an infinite past, but the memory dependence decays suffi- 
ciently fast to ensure continuity. Since H mimics synaptic influence it is 
typically a sum of a-profiles that mimic PSPs (Post Synaptic Potentials). 
a profiles decay exponentially fast with time, so they match this condition. 
We come back to this point in section 1331 

The Gibbs potential associated with (J^U) is: 

N 

4> n {uS) = ^2(uj k (n)\ogp k (n) + (1 - w fe (n))(l -Pfe(n)) ) , (25) 

k=l 

It is normalized by definition. 

3.2.4 Applications 

This model has been applied in a wide variety of experimental settings [51 1131 1531 
IH1 [Ml (Ml ED] ■ Efficient methods has been designed to estimate the parameters 

&■ 

To us, the main advantages of the GLM are: 

• The transition probability is known (postulated) from the beginning and 
does not require the heavy normalization (|17D imposed by potentials of 
the form ©; 

• The model parameters have a neurophysiological interpretation, and their 
number grows at most as a power law in the number of neurons. 

• It has good decoding performances 

• It holds for non stationary data. 
Its main drawbacks are: 

• It postulates an ad hoc form for the transition probability of the stochastic 
process; 

• It uses a quite questionable assumption of conditional independence: neu- 
rons are assumed independent at time n when the past is given. On the 
opposite, the maximal entropy principle does not require this assumption. 

• To us, the biophysical interpretation of the parameters Hkj is unclear. Do 
they correspond to "real" connectivity ? "functional" connectivity ? 
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3.3 Integrate and Fire neural networks 

The previous examples were mainly developed for data analysis: one speculates 
a form for transitions probabilities, performs parameters fitting, and then uses 
the model to decode or to extrapolate the statistics of complex events. Here we 
start from a different point of view asking the following questions: Can we have 
a reasonable idea of what could be the spike train statistics studying a neural 
network model? Do Gibbs distribution arise in these models ? What is the 
shape of the potential ? We focus here on a model proposed in [JJ 151 H3] where 
these questions have been answered. 

3.3.1 Model 

The integrate-and-fire model remains one of the most ubiquitous model for sim- 
ulating and analyzing the dynamics of neuronal circuits. Despite its simplified 
nature, it captures some of the essential features of neuronal dynamics. Denote 
V(t) the membrane potential vector with entries Vk(t). The continuous-time 
dynamics of V(t) is defined as follows. Fix a real variable 9 > called "firing 
threshold" . For a fixed time t, we have two possibilities: 

1. Either Vk(t) < 9, V7c = 1, ...,N. This corresponds to sub-threshold dy- 
namics. 

2. Or, 3k, Vk(t) > 9. Then, we speak of firing dynamics. 

The model proposed here is an extension of the conductance based Integrate- 
and-Fire neuron model introduced in [55] . The model-definition follows the pre- 
sentation given in (TTJ [8] . Neurons are considered as points, with neither spatial 
extension nor biophysical structure (axon, soma, dendrites). Dynamics is ruled 
by a set of stochastic differential equations where parameters, corresponding to 
chemical conductances, depend on the action potentials emitted in the past by 
the neurons. In this way, the dynamical system defined here is ruled both by 
continuous and discrete time dynamical variables. 

Subthreshold dynamics 

It is defined by: 

9L,k(Vk - E L ) -Y^gkjfruXVk - Ej) + J2W(Vj ~ V k ) + h(t), 
j j 

(26) 

• Cfc is the membrane capacity of neuron fc; 

• Ik (t) = ife eX '' ) {t)+o~B £fc (t) is a current when a time-dependent part %^ xt ^ (f ) 
(stimulus) and stochastic part o~B£,k{t) where is a white noise and 
<jb controls the noise intensity; 

• 9L.k is the leak conductance and El < the leak Nernst potential; 

• ~g~k~j mimics electric conductance (gap junctions) between neurons j and fc; 
these are passive and symmetric conductances; 



Ck ^T 



where: 
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• the term 

g kj (t,oj) = G kj ]T a kj (t-t<r\w)), (27) 

mimics the conductance of the chemical synapse j — > k, where: 

a kj (t) = —e~^H(t), (28) 

Tkj 

mimics a PSP, H{t) is the Heaviside function (that mimics causality); 
tj {u>) is the r th spike emitted by neuron j in the raster u>, therefore 
g k j(t,uj) depends on the whole spike history; Ej is the reversal potential 
of the chemical synapse j — > k. 

Firing dynamics and reset 

If, at time t, some neuron k reaches its firing threshold 9, V k (t) = 8, then this 
neuron emits a spike. To conciliate the continuous time dynamics of membrane 
potentials and the discrete time dynamics of spikes we define the spike and reset 
as follows. 

• The neuron membrane potential V k is reset to at the next integer time 
after t. 

• A spike is registered at time [t] + 1 where [t] is the integer part of t. This 
allows us to represent spike trains as events on a discrete time grid. It 
has the drawback of artificially synchronizing spikes coming from different 
neurons, in the deterministic case [TTJ[28]. However, the presence of noise 
in membrane potential dynamics eliminates this synchronization effect. 

• Spikes are separated by a time scale r sep > which is a multiple of 6 (thus 
an integer). 

• Between [t] + 1 and [t] + r sep the membrane potential V k is maintained 
to (refractory period). From time [t] + r sep on, V k evolves according to 
(l2"6l) until the next spike. 

• When the spike occurs (at time [t] + 1), the raster uj as well conductances 
g k j(t,uj) are updated. 

3.3.2 Main results 

This model has several variants: discrete time [7]; continuous time with chemical 
synapses [5] and continuous time with chemical and electric synapses |14j . We 
list here the main results concerning spike statistics and Gibbs distributions. 

1. Whatever the values of the parameters the model admits a unique Gibbs 
distribution in the general sense given in section f2. 31 

2. When the noise is weak and without gap junctions, the normalized Gibbs 
potential can be explicitly computed. It takes the form: 

N 

0„(w) = 2 (w fc (n) log Afc(n) + (1 - w fc (n) log(l - A fc (n)) ) , (29) 
fc=i 
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where 

Xk(n) = f ( b k (n - 1, u) + ^ (n - 1, w) + $^" ) (n - 1, w) ) , (30) 
where: 

• / is a sigmoid function; 

• bk{n — l,w) is a function depending on the threshold value, the leak 
Nernst potential, and on the integrated noise, integrated from the 
last time where has been reset (depending on uS) up to time n — 1; 

• $t ' xt \n — l,w) corresponds to the integrated effects of the external 
current i^ xt ^ on the membrane potential; 

• the term $i* vn '(n — l,w) corresponds to the integrated effects of 
chemical synapses on the membrane potential. 

3. Eq. (|29p . expresses that in this case, neurons are conditionally independent 
upon the past. 

4. In this conductance-based model, conductances depend on the past via 
(|27|) . One can consider as well a current-based model where conductances 
are fixed and current depend on the past spikes. In this case, the terms 
$i ext) (n - l,w) and n) (n - l,w) can be written as convolutions and 
one recovers a potential with a form analogous to (|25)1 . 

5. In the general case (gap junctions) , neurons are not conditionally inde- 
pendent. Gap junctions induce a coupling effect which does not allow any 
more the factorization (|2"5]l of the potential. 

6. Correlations (pairwise and higher order) are mainly due to chemical synapses 
and gap junctions. Additional correlations can also be induced by the 
stimulus using e.g. a current i( ext ~) where time fluctuations of i£ xt ' are cor- 
related with i{ ext ' , But these are extra-correlations that disappear when 
the stimulus is removed, whereas the dynamical correlations remains. 

7. The potential has an infinite range (infinite memory). However, thanks to 
the exponential decay of the alpha profile, one can show that the potential 
is continuous. This allows to propose Markovian approximation of the 
Gibbs distribution where the exact potential is replaced by a potential 
with a finite range [7J |5] ■ 

3.3.3 Applications 

What do we finally learn from the study of this model ? 

• We have a positive answer to the existence of Gibbs distributions in neural 
networks models. 

• An explicit form for the potential is known in specific cases. 

• The form ()30l) actually also fits with maximum entropy model, in the 
stationary case, as shown in section 13.41 
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• In this model the origin of correlations is essentially due to dynamics, not 
to the stimulus. 



• Gap junctions play here a central role in the structure of dynamical cor- 
relations and dependence of dynamics upon history (see [14] for more 
details). 

• The analysis holds for non stationary data. 

Considering potential uses of this study to fit real data the main criticism 

is: 

• This is a model. Is it sufficient to describe real neural networks ? For 
example, its application to retina data is controversial since it considers 
only spiking cells (that mimics ganglion cells), but retina has also non 
firing cells like most amacrine and bipolar cells. 

• In the general case there is no explicit form for the Gibbs potential. 

• Even when there exists an explicit form for the potential, it has quite a 
lot of parameters which can be difficult to fit from data. 

3.4 Relations between these approaches 

In this section, we establish a connection between the three examples of Gibbs 
distributions considered in sections 13.11 13.21 13.31 

Consider a family of transition probabilities satisfying the positivity condi- 
tion (ii) in section f2. 31 where we furthermore assume that the memory depth is 
finite and that transition probabilities are time-translation invariant. As stated 
in section 12.41 this define an homogenous Markov chain. The transition prob- 
abilities are thus functions of blocks u>q (see section 12.21 and the definition of 
time translation invariance). These functions can then take at most 2 Ar ' D+1 ) 
values. The same holds for the normalized potential Now, one can prove 
that any such function can be written as: 



with L = 2 N( - D+ V- 1 and O, is an observable (|8]) where the time index ranges 
from to D. The index I parametrizes an enumeration of all possible observables 
with N neurons and D + 1 time steps, where mo is the constant observable 
Oq = 1, and so on. 

Now, using the positivity condition and the results in section 13.11 one can 
show that any family of stationary transition probabilities with memory depth D 
can be associated with a potential of the form ©. The correspondence is actu- 
ally unique. This is a straightforward application of the celebrated Hammersley- 
Clifford theorem [H 13 133 13S]- 

An immediate consequence of this result is that, in the stationary case with 
finite memory , the GLM potential (f!?3")) and the Integrate and Fire (|29l) cor- 
respond to a Maximum Entropy model with a potential of the form @. The 
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(31) 
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parameters /3k in © are then nonlinear functions of the parameters in ([2~5jl or in 
([25)1 (see [7]). As a consequence, some of these parameters are redundant: there 
are a priori 2 N ^ D+1 ^ non vanishing parameters /3k while there are quite less pa- 
rameters in the GLM or in the Integrate and Fire (of order N 2 ). However, the 
GLM assumes conditional independence between neurons, while the maximum 
entropy approach is precisely used to take care of (pairwise and higher order) 
correlations between neurons. In this sense it is more general. 

In the non stationary case one can no more apply the maximum entropy 
principle (entropy is not defined). However, in the case where statistics depends 
on time on a slow time scale (compared to spike characteristic time scale) one 
can use a quasi-static approach where the parameters (3k in ([9]) vary slowly in 
time [U] (Tyrcha et al., 2012). On the opposite, the GLM allows to consider 
non stationary data with efficient results [TJ 2T] . 

The IF model contains both maximum entropy models and GLM. It has a 
maximum entropy Gibbs distribution in the stationary case, and it reduces to 
GLM upon several simplifications. In its more general form it allows the consid- 
eration of non stationarity and does not rely on the conditional independence 
assumption. Unfortunately, its generality is a weakness since an explicit form 
for the potential is not known yet in the general case. 

4 Conclusion 

In this paper we have argued that Gibbs distribution considered in a fairly 
general sense could constitute generic statistical models to fit spike trains data. 
The example of the Integrate and Fire model suggests that such distribution 
could be also defined for more elaborated neural networks models (FitzHugh- 
Nagumo or Hodgkin- Huxley). In particular, the existence and uniqueness of 
a Gibbs measure holds whenever there is continuity with respect to a raster, 
with a sufficiently fast decay of the variation ^ [IE] . As shown in [71 [5J Q3] 
this property is ensured when interactions between neurons decay exponentially 
fast. This is typically the case for chemical synapses where the PSP ([2"8)> decays 
exponentially fast with time. 

The interest of proposing Gibbs distribution constructed from neural net- 
work models is multiple. The model mimics a ncurophysiological structure 
where interactions between neurons, stimuli, and biophysical parameters are 
well identified. As a consequence the model-parameters can be easily inter- 
preted. Thus, the role of each specific biophysical parameter on spike statistics 
can be easily analysed. Also, the potential obtained this way is already nor- 
malized, while e.g. maximum entropy principle requires a complex procedure to 
achieve normalisation. Finally, in this context, it is possible to study the effect 
of a time-dependent stimulus on spike statistics. 

However, this approach, to be efficient requires (i) to have an analytical 
form for the potential; (ii) to be able to fit the many parameters of a non linear 
problem. This is yet far from being achieved. 
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